ABSTRACT

Previous research has studied the effects of different methods of item option weighting on the reliability and concurrent and predictive validity of achievement tests. Increases in reliability are generally found, but with mixed results for validity. Several methods of producing option weights (i.e., Guttman internal and external weights and judges' weights) and their effects on reliability and concurrent, predictive, and face validity were examined. Option weights to maximize reliability produced cross-validated increases in Hoyt reliability over rights-only scoring (.82 versus .58, respectively), decreases in correlations with other achievement tests, little change in predictive validity, and a loss in face validity (i.e., some correct options had lower weights than incorrect options). Weights to maximize validity did not cross-validate and led to a reduction in reliability and mixed validity results. Judges' weights produced increases in reliability and mixed results with validity. The size of Guttman weights was shown to interact with item option and test characteristics. It was concluded that option weighting offered limited, if any, improvement over unit weighting. (Author/GDC)



Item Option Weighting of Achievement Tests:
Comparative Study of Methods
Ronald G. Downey
Temple University



Running head: Item Option Weighting

Abstract 



Previous research has studied the effects of different methods of item option weighting on the reliability and concurrent and predictive validity of achievement tests. Generally, increases in reliability are found, but with mixed results for validity. This research attempted to interrelate several methods of producing option weights (i.e., Guttman internal and external weights and judges' weights) and examined their effects on reliability and concurrent, predictive, and face validity. Option weights to maximize reliability produced cross-validated (N = 974) increases in Hoyt reliability over rights-only scoring (.82 versus .58, respectively), decreases in correlations with other achievement tests, little change in predictive validity, and a loss in face validity (i.e., some correct options had lower weights than incorrect options). Weights to maximize validity did not cross-validate and led to a reduction in reliability and mixed validity results. Judges' weights produced increases in reliability and mixed results with validity. The size of Guttman weights was shown to interact with item option and test characteristics. It was concluded that option weighting offered limited, if any, improvement over unit weighting.

Item Option Weighting of Achievement Tests:
Comparative Study of Methods



Current scoring systems for multiple-choice achievement test items are based on assumptions about the nature of the individual's response(s) to an item. The major assumptions of "all-or-none" knowledge, random incorrect responses, and equal option distractibility have been frequently criticized (Cureton, 1966; Davis, 1967; Lord, 1963; Stanley, 1954; and Willey, 1960).

A variety of earlier research efforts have been directed at the development of methods for differential "item option weighting" for achievement tests which are not based upon the above assumptions about the nature of responses (see Stanley and Wang, 1970). Nedelsky (1959) conducted one of the earliest studies in this area and found that a test utilizing a worst-distractor weighting procedure was more reliable than a rights-only score. Lord (Note 1) also found partial support for the worst-distractor procedure. Davis and Fifer (1959) used three different option weighting procedures: the correlations between the item option and total test score as weights, judges' weights, and weights as suggested by Flanagan (1935). Their findings indicated that the use of option weights generally increased reliability but not validity to predict teacher ratings (also see Davis, 1959). Sabers and White (1969) also found results similar to those of Davis and Fifer.

Recently, an increasing number of studies have been conducted using either a variant of a method originally suggested by Guttman (1941) or an elaboration of the Davis and Fifer (1959) method of judges' weights. Hendrickson (1971, Note 2) conducted a study with the Scholastic Aptitude Test using the weighting method suggested by Guttman and found substantial increases in reliability and lower intercorrelations of the verbal and quantitative subtests. Reilly and Jackson (1973) and Reilly (1975) used similar procedures with the Graduate Record Examination and again found increases in reliability, a tendency for lower intercorrelations between subtests, and lower validity coefficients with undergraduate grade point averages (backward prediction). Reilly (1975) presented some evidence that weighting of omitted items produced undesirable results. Waters (1976), using empirical weighting procedures similar to Davis and Fifer (1959), also found increased reliability and decreased intercorrelations with other measures. Hendrickson (1971) has suggested that these results can be explained if one assumes that the weighted test is more factorially pure, which would lead to increased reliability and less overlap with other measures.

Hambleton, Roberts, and Traub (1970), Patnaik and Traub (1973), and Kansup and Hakstian (1975) all used a variant of option weighting where weights were derived by expert judges and obtained similar results (viz., increased internal reliability but mixed results for predictive validity). Kansup and Hakstian (1975) have made a strong appeal for dropping research on item option weighting due to the inability to prove its value and the preponderance of evidence against it.

While the above studies of item option weighting have generally found moderate to substantial increases in reliability, the question of changes in validity has been less clear. Most studies have found that correlations with other similar achievement tests have decreased, which would follow from the concept that the test is becoming more factorially "pure" (see Hendrickson, 1971). Hendrickson (Note 2) referred to this as quasi-validity. There is a need both to produce more evidence regarding concurrent (quasi-) and predictive validity and to compare the two separate lines of research using Guttman and judges' weights.

The present study was designed to investigate the comparative effects on reliability and concurrent (quasi-), predictive, and face validity of item option weighting procedures. Three different methods of option weighting were used. The method of "reciprocal averages" (Lawshe and Harris, 1958, and Baker and Hoyt, Note 3) was used to derive Guttman weights for maximizing reliability (internal consistency), and Guttman weights were used to maximize validity. In addition, judges' option weights were developed. These were compared with the conventional rights-only scoring.



Method

Subjects

The sample was composed of 1,950 entering freshman college students at Temple University. The total sample was randomly split into two groups of approximately equal size (976 in the experimental group and 974 in the cross-validation sample). All empirical weights were derived with the experimental group, and comparisons were made on the results from the cross-validation sample. Due to different course placements, which were based on the test to be weighted, some individuals did not have criterion scores (see Table 2).

Procedures

The test used was the Cooperative English Test, English Expression (Educational Testing Service, Note 4). Only the Effectiveness section was used, a thirty-item test on the ability to determine intended meaning. The concurrent validity measures used were the verbal and quantitative scores of the Scholastic Aptitude Test. The English grades for the first two semesters of college English were used as the measures for predictive validity.

Weighting Schemes

In addition to the conventional weights, three types of weighting schemes were used in this study. Two of the schemes were based on the method proposed by Guttman (1941), and the third was based on the assumption that experts can assign meaningful weights to options based on the amount of correct (or incorrect) information contained in the option.

Guttman proposed a method which weighted the option (or category) by using the mean criterion score of the individuals selecting that option. Guttman assumed that the value he wanted to achieve was one which minimized individual variability over a group of subjects. This minimization was accomplished by maximizing a correlation coefficient represented by the ratio between the variance among subjects and the total variance. Guttman went on to show that the set of weights satisfying this requirement are proportional to the mean score of the individuals selecting an option (cf. Guttman, 1941, p. 341). Weights derived from the reciprocal averages procedure were only approximations of the final Guttman weights; therefore, the procedure was iterated several times, using the derived weights to rescore the test and recalculating new weights until the weights were stabilized.

Using Lord's (1958) nomenclature, the sources of variance in a test can be set out as follows. Let $x_{ci}$ be the scoring weight of option c for item i (m = number of items) and N equal the number of subjects. Let $y_{ai}$ be the score obtained by an individual a on item i, so that $y_{ai} = x_{ci}$ whenever person a chooses option c. Therefore, $y_{a.} = \sum_{i=1}^{m} y_{ai}$ (total score of person a), $y_{.i} = \sum_{a=1}^{N} y_{ai}$ (total score of the item), and $y_{..} = \sum_{a=1}^{N} \sum_{i=1}^{m} y_{ai}$ (grand total). The item-person matrix and the Analysis of Variance Table (Table 1) will help explain these sources of variance.



Insert Table 1 about here

Guttman defined $MS_I$, the between-items mean square, as equal to zero. The solution to the component analysis is then to maximize the correlation ratio between the variance among subjects and the total variance $T$. When $MS_P + E$ is substituted for $T$ and the equation reduced, the correlation becomes a function of the ratio $E/MS_P$. Maximizing it is equivalent to maximizing the between-person variance and minimizing the error term. This solution is also equivalent to maximizing the Hoyt (1951) reliability, which is found by the following formula: $r_{tt} = 1 - E/MS_P$, where $r_{tt}$ equals reliability. The ratio $E/MS_P$ is common to both solutions.
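As an illustration of the Hoyt formula, the following is a minimal sketch that computes $r_{tt} = 1 - E/MS_P$ from a complete persons-by-items matrix of item scores, using an ordinary two-way analysis of variance without replication. The function name `hoyt_reliability` and the list-of-lists data layout are assumptions made for this example; they do not come from the study.

```python
def hoyt_reliability(scores):
    """Hoyt reliability, 1 - E / MS_P, for a persons-by-items score matrix.

    scores[a][i] is the score of person a on item i (no missing cells).
    """
    n_persons = len(scores)
    n_items = len(scores[0])
    grand_total = sum(sum(row) for row in scores)
    correction = grand_total ** 2 / (n_persons * n_items)

    # Sums of squares for total, persons (rows), and items (columns).
    ss_total = sum(y * y for row in scores for y in row) - correction
    ss_persons = sum(sum(row) ** 2 for row in scores) / n_items - correction
    item_totals = [sum(row[i] for row in scores) for i in range(n_items)]
    ss_items = sum(t * t for t in item_totals) / n_persons - correction
    ss_error = ss_total - ss_persons - ss_items

    ms_persons = ss_persons / (n_persons - 1)
    ms_error = ss_error / ((n_persons - 1) * (n_items - 1))
    return 1.0 - ms_error / ms_persons
```

For rights-only scoring the cells are 0 or 1; for weighted scoring each cell holds the weight of the option the person marked, so the same function can be applied to both kinds of score.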

The procedure used to develop weights was as follows: if $x_{cis}$ equals the iterated weights at iteration s, then

$$x_{cis} = \frac{1}{N_{ci}} \sum_{a} \left( y_{ca.} - y_{ai} \right),$$

where the sum is taken over the persons a marking option c of item i. In this formula $N_{ci}$ equals the number of persons marking option c and $y_{ca.}$ equals the total score of a person marking option c. The subtraction of $y_{ai}$ then removed the bias for the item being used. The Guttman procedure to maximize reliability began (iteration 1) with $y_{ca.}$ equal to the total conventional score. Each iteration (2 through 9) used $y_{ca.} = \sum_{i=1}^{m} x_{ci(s-1)}$ to develop a new set of weights. All groups were iterated nine times, and weights from the ninth iteration were used for cross-validation.
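The iteration just described can be sketched in code. This is an illustrative reading of the procedure rather than the program used in the study: the function name `reciprocal_averages`, the data layout, and in particular the centering and rescaling of the weights within each item are assumptions, since the scaling applied in the original program is not described in the text.

```python
import math


def reciprocal_averages(responses, key, n_iterations=9):
    """Iteratively derive option weights from rest-of-test scores.

    responses[a][i] is the option label person a marked on item i
    (an omission can be given its own label, e.g. "omit").
    key[i] is the keyed (correct) option of item i.
    Returns one dict per item mapping option label -> weight.
    """
    n_persons = len(responses)
    n_items = len(key)
    # Iteration 1 starts from conventional rights-only item scores.
    item_scores = [
        [1.0 if resp[i] == key[i] else 0.0 for i in range(n_items)]
        for resp in responses
    ]
    weights = []
    for _ in range(n_iterations):
        totals = [sum(row) for row in item_scores]
        weights = []
        for i in range(n_items):
            # Subtract the person's own score on item i from the total.
            rest = [totals[a] - item_scores[a][i] for a in range(n_persons)]
            mean_rest = sum(rest) / n_persons
            sd_rest = math.sqrt(sum((r - mean_rest) ** 2 for r in rest) / n_persons)
            sd_rest = sd_rest or 1.0  # guard against a constant rest score
            sums, counts = {}, {}
            for a in range(n_persons):
                option = responses[a][i]
                sums[option] = sums.get(option, 0.0) + rest[a]
                counts[option] = counts.get(option, 0) + 1
            # ASSUMPTION: centre and rescale so weights stay comparable
            # across iterations; the source does not give its scaling.
            weights.append({
                option: (sums[option] / counts[option] - mean_rest) / sd_rest
                for option in sums
            })
        # Rescore the test with the new weights for the next pass.
        item_scores = [
            [weights[i][resp[i]] for i in range(n_items)] for resp in responses
        ]
    return weights
```

With this centering, the frequency-weighted sum of the option weights within an item is zero. Replacing the rest-of-test score with an external criterion score and running a single pass would give the external variant of the same idea.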

Guttman did not restrict his method to values determined by internal weighting; as Stanley and Wang (1970) have suggested, other scores could be used to develop weights. The second Guttman weighting procedure used English 001 grades as the score ($y_{ca.}$) and produced a set of weights maximizing the differences between subjects receiving different grades. Only one iteration was performed for this procedure. Both of the Guttman weighting methods treat omitted items as valid options and, therefore, weights were derived for them. Preliminary results indicated that the scores were positively skewed and, therefore, all weighted Guttman total test scores were normalized.

The third weighting procedure was one suggested by Davis and Fifer (1959). Weights applied were determined by having English teachers rate the various options as to the amount of correct and/or incorrect information displayed by a person choosing this option. Seven instructors in the English department were asked to rate the options. Below are the directions given them for making their judgments:

It is generally agreed that when multiple-choice examinations are used, options for a particular question vary in their degree of correctness. You are being asked to rate options on the English Expression portion of the Cooperative English Test as to their degree of correctness. Due to the length of the task, only the first part (30 items, Effectiveness) of the Expression portion will be treated in this manner. This means that since you are rating each option (4) of every question (30) there will be 120 ratings.

For each option you should rate it in terms of its degree of correctness along the following scale of 1 to 7: Mark a (1) if the option is incorrect; mark a (2) or (3) if the options are partially incorrect; mark a (4) if the option is partially incorrect and partially correct; mark a (5) or (6) if the option is partially correct; and mark a (7) if the option is correct. In rating the options, you should determine the amount of correct and/or incorrect information a respondent would have to have available in order to mark the option as the right answer.

The weights applied were the mean of weights assigned by the seven instructors. The Effectiveness test was then scored using these weights.
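The judges' weighting step amounts to averaging the ratings for each option and summing the weights of the options a person marked. A minimal sketch follows; the function names and the nested list/dict layout are hypothetical, and the example assumes every item was answered, because the treatment of omitted items under judges' weights is not stated in the text.

```python
def mean_judge_weights(ratings):
    """Average each option's rating across judges.

    ratings[j][i] is a dict mapping option label -> rating (1 to 7)
    given by judge j for item i. Returns one dict per item.
    """
    n_judges = len(ratings)
    n_items = len(ratings[0])
    return [
        {
            option: sum(judge[i][option] for judge in ratings) / n_judges
            for option in ratings[0][i]
        }
        for i in range(n_items)
    ]


def weighted_score(responses, weights):
    """Sum the weights of the options one person marked (one per item)."""
    return sum(weights[i][choice] for i, choice in enumerate(responses))
```

With seven judges and thirty four-option items, `ratings` would hold the 7 x 120 ratings collected under the directions quoted above.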

Analysis

As a check against biased selection procedures, t tests were made on differences between variables for the experimental and cross-validation samples. Hoyt (1951) reliability estimates were derived. Estimates of the predictive validity were the zero-order correlation coefficients between test scores for each type of weighting procedure and English grades. Concurrent validity was the zero-order correlations between SAT (V and Q) and test scores for each procedure. Since only comparative results between methods, and not the level of prediction, were of major concern, adjustments for restrictions in range on the English grades were not made. All comparisons between methods were made on the cross-validation sample.
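The two statistics named in this section can be sketched briefly. The text does not say which form of the t test was used, so the pooled-variance, independent-samples form below is an assumption, as are the function names.

```python
import math


def pearson_r(x, y):
    """Zero-order (Pearson) correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pooled_t(sample_1, sample_2):
    """Independent-samples t statistic with a pooled variance estimate."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1 = sum(sample_1) / n1
    mean_2 = sum(sample_2) / n2
    ss1 = sum((v - mean_1) ** 2 for v in sample_1)
    ss2 = sum((v - mean_2) ** 2 for v in sample_2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
```

A validity coefficient for one weighting method would be `pearson_r` applied to the weighted test scores and the criterion (grades or SAT scores) within the cross-validation sample; `pooled_t` would compare one variable across the two samples.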

Results

Table 2 presents the means, standard deviations, and numbers of subjects for the four criterion scores and for the conventional test score for the experimental and cross-validation samples. The t-tests between the two samples for each of the variables, also presented in Table 2, did not show any significant differences.




Insert Table 2 about here

Table 3 summarizes the reliability and validity coefficients for the experimental group for each of the four weighting methods. Table 4 summarizes the results for the cross-validation sample. While reliability and validity are separate concepts, they have been found to interact and they will, therefore, be discussed jointly (see Lord and Novick, 1968, and Tucker, 1946, for a discussion of "the attenuation paradox"). A further complexity was the face validity of the weight for the correct option. If the procedures produced the highest weight for the correct option, then the item (and, extended over items, the test) was considered to have face validity.
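The face validity criterion can be expressed as a simple count over items. The sketch below is illustrative only; the function name is hypothetical, and treating a tie between the correct option and another option as a failure is an assumption, since the text does not say how ties were handled.

```python
def count_face_valid_items(weights, key):
    """Count items whose keyed option carries the strictly highest weight.

    weights[i] maps option label -> weight for item i (omissions may be
    included as their own label); key[i] is the keyed option of item i.
    """
    count = 0
    for item_weights, correct in zip(weights, key):
        best_other = max(
            w for option, w in item_weights.items() if option != correct
        )
        # ASSUMPTION: a tie with another option does not count.
        if item_weights[correct] > best_other:
            count += 1
    return count
```

Applied to a set of thirty item weight tables, the function returns the number of items that meet the criterion, which is the kind of tally reported for each weighting method in the results.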



Insert Tables 3 and 4 about here

Using only the conventional procedure as the comparative baseline, the Guttman internal weighting procedure produced test scores (see Table 4) which were more reliable (.82), tended to have a lower correlation with the SAT scores, but had only a moderate to negligible effect upon prediction of English grades. Twenty-one items out of thirty received the highest item option weight (positive) for the correct option.

• • • 

..•For th^^ights derived to maximize predictive validity, it can be seen 
from Table 4 that reliabiUit^ dropped (.45 versus .58. for the convefitional group) 
^ the correlations with SAT dropped, and' finally the predictive validity for first 

/ 

semester grades' did not change, but prediction of second semester grades improved. 
Less than hal^ of the thirty items had the highest weighti;Jor the correct 

i • " " . . ^ ^ 

^ optJLon. ^ 

Table 4 shows a slightly different pattern for the judges' weights. Judges' weights produced a slightly more reliable test (r = .66) with little change in the concurrent validity, a moderate increase in the prediction of first semester English grades, and a moderate decrease for the second semester. It should be noted that the results for judges from the experimental group are independent of the weighting procedure for judges and indicated no changes in predictive validity (see Table 3). The judges produced the highest weight for the correct option for all thirty items.

Discussion and Conclusion

The results from this study are similar to previous findings indicating that an internal weighting procedure can produce a more reliable test which has a lower relationship with other similar measures (see Hendrickson, Note 1; Reilly and Jackson, 1973; and Waters, 1976). But this procedure produced little if any improvement in predictive validity, and at a much greater administrative cost and lower face validity. The weights derived by maximizing validity produced a less reliable test with only a hint that validity would be improved. Weighting for increases in validity had high administrative costs with a loss in face validity. The judges' weights produced the most positive results, with a moderate increase in reliability and a moderate increase in predictive validity. With the exception that the test would generally have to be scored by computer, the costs for developing the judges' weighting procedure are small.

Several more general points should be made. First, the "attenuation paradox" was in general supported, with increases in reliability not producing increases in validity and increases in validity not being stable and lowering reliability. Second, the empirically derived weights produced undesirable side effects, with incorrect item options having higher weights than correct options. Third, the Guttman procedures for deriving weights had other undesirable side effects, including large negative weights, large weights assigned to omissions, and skewed score distributions. Almost all these effects upon the option weights are the direct result of the relationship between the option difficulty and the size of the weight. Since the sum of the weights for options in an item was set equal to zero, low difficulty options will have small weights approaching zero. A corollary of this rule is that high difficulty options (including omitted items) will tend to have relatively large weights due to the possibility that a highly selected group or individual responded to that option. The findings, therefore, suggest that Guttman option weightings interact with the item and test characteristics.

The results do not support the use of either of the Guttman procedures for option weighting because of the high costs associated with the minimum gains. The judges' weighting procedure showed the most promise for producing a more reliable and valid test. The consistency of the results using option weighting methods suggests that it is becoming clear that option weighting offers only limited improvement over the conventional method of unit weighting.

Footnotes

This study was conducted as part of a dissertation under the direction of Professor Harold C. Reppert, in partial fulfillment of the requirements for the doctoral degree at Temple University, Philadelphia, Penna. Portions of this article were presented at the American Psychological Association Meeting, Washington, D.C., September, 1976.

Requests for reprints should be sent to Ronald G. Downey, Center for Student Development, Kansas State University, Manhattan, Kansas 66506.

The computer program used to obtain the Guttman solutions was originally developed by J. Hendrickson at Johns Hopkins and only minor modifications were made in her system.
Table 1

Item by Person Matrix and Analysis of Variance
Table 2

Comparisons of Experimental and Cross-Validation Groups:
Summary Statistics and t-tests

Variable                       Experimental    Cross-Validation    t-test

SAT-Verbal            Mean       525.26           528.82            .765
                      S.D.        81.40            80.11
                      N          844 (a)          844 (a)

SAT-Quantitative      Mean     [?]40.38        [illegible]       [illegible]
                      S.D.        78.05            76.63
                      N          844 (a)          844 (a)

Grades-1st Semester   Mean         3.34             3.36         [illegible]
                      S.D.     [illegible]           .88
                      N          744 (b)          738 (b)

Grades-2nd Semester   Mean         3.43 (group illegible)        [illegible]
                      S.D.     [illegible]      [illegible]
                      N        [illegible]      [illegible]

Test Score            Mean        20.08            20.07            .090
                      S.D.         3.73             3.62
                      N          976              974

(a) Lower N is due to missing data.
(b) Lower N is due to placement procedures.
(c) Lower N is due to placement procedures and drop-outs.
Table 3

Experimental Group: Summary of Reliability
and Validity Coefficients for Each Weighting Method

Table 4

Cross-Validation Group: Summary of Reliability
and Validity Coefficients for Each Weighting Method